Multi-channel signal encoding and decoding

ABSTRACT

A multi-part fixed codebook includes both individual fixed codebooks (FC1, FC2) for each channel and a shared fixed codebook (FCS). Although the shared fixed codebook (FCS) is common to all channels, the channels are associated with individual lags (D1, D2). Furthermore, the individual fixed codebooks (FC1, FC2) are associated with individual gains (g_(F1), g_(F2)), and the individual lags (D1, D2) are also associated with individual gains (g_(FS1), g_(FS2)). The excitation from each individual fixed codebook (FC1, FC2) is added to the corresponding excitation (a shared codebook vector, but individual lags and gains for each channel) from the shared fixed codebook (FCS).

TECHNICAL FIELD

[0001] The present invention relates to encoding and decoding of multi-channel signals, such as stereo audio signals.

BACKGROUND OF THE INVENTION

[0002] Conventional speech coding methods are generally based on single-channel speech signals. An example is the speech coding used in a connection between a regular telephone and a cellular telephone. Speech coding is used on the radio link to reduce bandwidth usage on the frequency-limited air interface. Well-known examples of speech coding are PCM (Pulse Code Modulation), ADPCM (Adaptive Differential Pulse Code Modulation), sub-band coding, transform coding, LPC (Linear Predictive Coding) vocoding, and hybrid coding, such as CELP (Code-Excited Linear Predictive) coding [1-2].

[0003] In an environment where the audio/voice communication uses more than one input signal, for example a computer workstation with stereo loudspeakers and two microphones (stereo microphones), two audio/voice channels are required to transmit the stereo signals. Another example of a multi-channel environment would be a conference room with two-, three- or four-channel input/output. This type of application is expected to be used on the Internet and in third generation cellular systems.

[0004] General principles for multi-channel linear predictive analysis-by-synthesis (LPAS) signal encoding/decoding are described in [3]. However, the described principles are not always optimal in situations where there is a strong inter-channel correlation or a varying inter-channel correlation.

SUMMARY OF THE INVENTION

[0005] An object of the present invention is to better exploit inter-channel correlation in multi-channel linear predictive analysis-by-synthesis signal encoding/decoding and preferably to facilitate adaptation of encoding/decoding to varying inter-channel correlation.

[0006] This object is solved in accordance with the appended claims.

[0007] Briefly, the present invention involves a multi-part fixed codebook including an individual fixed codebook for each channel and a shared fixed codebook common to all channels. This strategy makes it possible to vary the number of bits that are allocated to the individual codebooks and the shared codebook either on a frame-by-frame basis, depending on the inter-channel correlation, or on a call-by-call basis, depending on the desired gross bitrate. Thus, in a case where the inter-channel correlation is high, essentially only the shared codebook will be required, while in a case where the inter-channel correlation is low, essentially only the individual codebooks are required. If the inter-channel correlation is known or assumed to be high, a shared fixed codebook common to all channels may suffice. Similarly, if the desired gross bitrate is low, essentially only the shared codebook will be used, while in a case where the desired gross bitrate is high, the individual codebooks may be used.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:

[0009] FIG. 1 is a block diagram of a conventional single-channel LPAS speech encoder;

[0010] FIG. 2 is a block diagram of an embodiment of the analysis part of a prior art multi-channel LPAS speech encoder;

[0011] FIG. 3 is a block diagram of an embodiment of the synthesis part of a prior art multi-channel LPAS speech encoder;

[0012] FIG. 4 is a block diagram of an exemplary embodiment of the synthesis part of a multi-channel LPAS speech encoder in accordance with the present invention;

[0013] FIG. 5 is a flow chart of an exemplary embodiment of a multi-part fixed codebook search method in accordance with the present invention;

[0014] FIG. 6 is a flow chart of another exemplary embodiment of a multi-part fixed codebook search method in accordance with the present invention; and

[0015] FIG. 7 is a block diagram of an exemplary embodiment of the analysis part of a multi-channel LPAS speech encoder in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0016] In the following description the same reference designations will be used for equivalent or similar elements.

[0017] The present invention will now be described by introducing a conventional single-channel linear predictive analysis-by-synthesis (LPAS) speech encoder, and a general multi-channel linear predictive analysis-by-synthesis speech encoder described in [3].

[0018] FIG. 1 is a block diagram of a conventional single-channel LPAS speech encoder. The encoder comprises two parts, namely a synthesis part and an analysis part (a corresponding decoder will contain only a synthesis part).

[0019] The synthesis part comprises an LPC synthesis filter 12, which receives an excitation signal i(n) and outputs a synthetic speech signal ŝ(n). Excitation signal i(n) is formed by adding two signals u(n) and v(n) in an adder 22. Signal u(n) is formed by scaling a signal f(n) from a fixed codebook 16 by a gain g_(F) in a gain element 20. Signal v(n) is formed by scaling a delayed (by delay “lag”) version of excitation signal i(n) from an adaptive codebook 14 by a gain g_(A) in a gain element 18. The adaptive codebook is formed by a feedback loop including a delay element 24, which delays excitation signal i(n) one sub-frame length N. Thus, the adaptive codebook will contain past excitations i(n) that are shifted into the codebook (the oldest excitations are shifted out of the codebook and discarded). The LPC synthesis filter parameters are typically updated every 20-40 ms frame, while the adaptive codebook is updated every 5-10 ms sub-frame.
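
The signal flow of this synthesis part may be illustrated by the following minimal Python sketch. It is not taken from the described encoder: the function names are hypothetical, the synthesis filter is assumed to be all-pole with A(z) = 1 + a[0]z⁻¹ + a[1]z⁻² + ..., and the lag is assumed to be an integer no shorter than one sub-frame so that only past excitation samples are needed.

```python
def form_excitation(fixed_vec, past_excitation, lag, g_f, g_a):
    """Form i(n) = g_F * f(n) + g_A * i(n - lag) for one sub-frame.

    past_excitation holds earlier excitation samples, oldest first.
    Assumption: lag >= sub-frame length, so only past samples are read.
    """
    n_sub = len(fixed_vec)
    if lag < n_sub or lag > len(past_excitation):
        raise ValueError("lag must lie in [sub-frame length, history length]")
    start = len(past_excitation) - lag
    adaptive_vec = past_excitation[start:start + n_sub]
    return [g_f * f + g_a * v for f, v in zip(fixed_vec, adaptive_vec)]


def lpc_synthesis(excitation, a, state=None):
    """Filter the excitation through the all-pole filter 1/A(z).

    a holds a[0], a[1], ... of A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...
    state holds previous output samples, most recent first.
    """
    mem = list(state) if state is not None else [0.0] * len(a)
    out = []
    for x in excitation:
        y = x - sum(ak * yk for ak, yk in zip(a, mem))
        out.append(y)
        if mem:
            mem = [y] + mem[:-1]
    return out


def update_adaptive_codebook(past_excitation, excitation):
    """Shift the new excitation in and discard the oldest samples."""
    return past_excitation[len(excitation):] + list(excitation)
```

In such a sketch a decoder would call `form_excitation`, `lpc_synthesis` and `update_adaptive_codebook` once per sub-frame, using the codebook index, lag and gains received from the encoder.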

[0020] The analysis part of the LPAS encoder performs an LPC analysis of the incoming speech signal s(n) and also performs an excitation analysis.

[0021] The LPC analysis is performed by an LPC analysis filter 10. This filter receives the speech signal s(n) and builds a parametric model of this signal on a frame-by-frame basis. The model parameters are selected so as to minimize the energy of a residual vector formed by the difference between an actual speech frame vector and the corresponding signal vector produced by the model. The model parameters are represented by the filter coefficients of analysis filter 10. These filter coefficients define the transfer function A(z) of the filter. Since the synthesis filter 12 has a transfer function that is at least approximately equal to 1/A(z), these filter coefficients will also control synthesis filter 12, as indicated by the dashed control line.
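
One common way of selecting coefficients that minimize the residual energy is the autocorrelation method solved with the Levinson-Durbin recursion. The description does not state which method filter 10 uses, so the following Python sketch is only an illustration under that assumption; the function names are hypothetical, and windowing and bandwidth expansion are omitted.

```python
def autocorrelation(frame, order):
    """Return r[0..order], with r[k] = sum_n frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns (a, err), where a[0] == 1.0 and err is the residual energy
    of the prediction error for the given autocorrelation values.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err == 0.0:
            break  # signal is perfectly predictable (or silent)
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient of stage m
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err *= 1.0 - k_m * k_m
    return a, err
```

For example, autocorrelation values r = [1.0, 0.5, 0.25] yield A(z) = 1 - 0.5z⁻¹ with a residual energy of 0.75, since the second-order term vanishes for this first-order signal model.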

[0022] The excitation analysis is performed to determine the best combination of fixed codebook vector (codebook index), gain g_(F), adaptive codebook vector (lag) and gain g_(A) that results in the synthetic signal vector {ŝ(n)} that best matches speech signal vector {s(n)} (here { } denotes a collection of samples forming a vector or frame). This is done in an exhaustive search that tests all possible combinations of these parameters (sub-optimal search schemes, in which some parameters are determined independently of the other parameters and then kept fixed during the search for the remaining parameters, are also possible). In order to test how close a synthetic vector {ŝ(n)} is to the corresponding speech vector {s(n)}, the energy of the difference vector {e(n)} (formed in an adder 26) may be calculated in an energy calculator 30. However, it is more efficient to consider the energy of a weighted error signal vector {e_(W)(n)}, in which the errors have been re-distributed in such a way that large errors are masked by large amplitude frequency bands. This is done in weighting filter 28.
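
As an illustration of one of the sub-optimal schemes mentioned above, the following Python sketch searches a fixed codebook after the other parameters have been fixed. It assumes that each candidate vector has already been passed through the synthesis and weighting filters and that the target is the weighted speech vector with the other contributions removed; these assumptions and the function name are hypothetical. For each candidate the gain that minimizes the error energy has the closed form g = ⟨t, y⟩ / ⟨y, y⟩.

```python
def search_fixed_codebook(target, filtered_codebook):
    """Return (index, gain, error_energy) minimizing ||target - g * y||^2.

    filtered_codebook[k] is codebook vector k after synthesis and
    weighting filtering (an assumption of this sketch).
    """
    best = None
    for k, y in enumerate(filtered_codebook):
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero vector cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares optimal gain for this vector
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best
```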

[0023] The modification of the single-channel LPAS encoder of FIG. 1 to a multi-channel LPAS encoder in accordance with [3] will now be described with reference to FIGS. 2-3. A two-channel (stereo) speech signal will be assumed, but the same principles may also be used for more than two channels.

[0024] FIG. 2 is a block diagram of an embodiment of the analysis part of the multi-channel LPAS speech encoder described in [3]. In FIG. 2 the input signal is now a multi-channel signal, as indicated by signal components s₁(n), s₂(n). The LPC analysis filter 10 in FIG. 1 has been replaced by an LPC analysis filter block 10M having a matrix-valued transfer function A(z). Similarly, adder 26, weighting filter 28 and energy calculator 30 are replaced by corresponding multi-channel blocks 26M, 28M and 30M, respectively.

[0025] FIG. 3 is a block diagram of an embodiment of the synthesis part of the multi-channel LPAS speech encoder described in [3]. A multi-channel decoder may also be formed by such a synthesis part. Here LPC synthesis filter 12 in FIG. 1 has been replaced by an LPC synthesis filter block 12M having a matrix-valued transfer function A⁻¹(z), which is (as indicated by the notation) at least approximately equal to the inverse of A(z). Similarly, adder 22, fixed codebook 16, gain element 20, delay element 24, adaptive codebook 14 and gain element 18 are replaced by corresponding multi-channel blocks 22M, 16M, 20M, 24M, 14M and 18M, respectively.

[0026] A problem with this prior art multi-channel encoder is that it is not very flexible with regard to varying inter-channel correlation due to varying microphone environments. For example, in some situations several microphones may pick up speech from a single speaker. In such a case the signals from the different microphones are essentially delayed and scaled versions (assuming echoes may be neglected) of the same signal, i.e. the channels are strongly correlated. In other situations there may be different simultaneous speakers at the individual microphones. In this case there is almost no inter-channel correlation.

[0027] FIG. 4 is a block diagram of an exemplary embodiment of the synthesis part of a multi-channel LPAS speech encoder in accordance with the present invention. An essential feature of the present invention is the structure of the multi-part fixed codebook. According to the invention it includes both individual fixed codebooks FC1, FC2 for each channel and a shared fixed codebook FCS. Although the shared fixed codebook FCS is common to all channels (which means that the same codebook index is used by all channels), the channels are associated with individual lags D1, D2, as illustrated in FIG. 4. Furthermore, the individual fixed codebooks FC1, FC2 are associated with individual gains g_(F1), g_(F2), while the individual lags D1, D2 (which may be either integer or fractional) are associated with individual gains g_(FS1), g_(FS2). The excitation from each individual fixed codebook FC1, FC2 is added to the corresponding excitation (a common codebook vector, but individual lags and gains for each channel) from the shared fixed codebook FCS in an adder AF1, AF2. Typically the fixed codebooks comprise algebraic codebooks, in which the excitation vectors are formed by unit pulses that are distributed over each vector in accordance with certain rules (this is well known in the art and will not be described in further detail here).
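
The combination performed in adders AF1, AF2 may be sketched as follows in Python. The sketch makes assumptions that the description leaves open: lags are restricted to non-negative integers, and a delayed vector is zero-filled at the start of the sub-frame. The function names are hypothetical.

```python
def delay(vec, lag):
    """Delay vec by an integer number of samples within one sub-frame.

    Assumption: samples shifted in at the start are zero; the description
    does not specify how sub-frame boundaries are handled.
    """
    if lag < 0:
        raise ValueError("this sketch only handles non-negative integer lags")
    n = len(vec)
    return ([0.0] * min(lag, n) + list(vec))[:n]


def fixed_excitation(individual_vecs, shared_vec, lags, g_f, g_fs):
    """Per-channel fixed codebook excitation.

    For channel k: u_k(n) = g_F[k] * f_k(n) + g_FS[k] * f_S(n - D[k]),
    where f_k comes from the individual codebook and f_S from the
    shared codebook (same vector for every channel).
    """
    out = []
    for f_k, d_k, gf_k, gfs_k in zip(individual_vecs, lags, g_f, g_fs):
        shared_k = delay(shared_vec, d_k)
        out.append([gf_k * a + gfs_k * b for a, b in zip(f_k, shared_k)])
    return out
```

With a shared vector [1, 0, 0, 0], lags (0, 1) and inter-channel gains (1.0, 0.8), the second channel receives the same pulse one sample later and scaled by 0.8, which mirrors the delayed-and-scaled relationship between strongly correlated channels.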

[0028] This multi-part fixed codebook structure is very flexible. For example, some coders may use more bits in the individual fixed codebooks, while other coders may use more bits in the shared fixed codebook. Furthermore, a coder may dynamically change the distribution of bits between individual and shared codebooks, depending on the inter-channel correlation. For some signals it may even be appropriate to allocate more bits to one individual channel than to the other channels (asymmetric distribution of bits).

[0029] Although FIG. 4 illustrates a two-channel fixed codebook structure, it is appreciated that the concepts are easily generalized to more channels by increasing the number of individual codebooks and the number of lags and inter-channel gains.

[0030] The shared and individual fixed codebooks are typically searched in serial order. The preferred order is to first determine the shared fixed codebook excitation vector, lags and gains. Thereafter the individual fixed codebook vectors and gains are determined.

[0031] Two multi-part fixed codebook search methods will now be described with reference to FIGS. 5 and 6.

[0032] FIG. 5 is a flow chart of an embodiment of a multi-part fixed codebook search method in accordance with the present invention. Step S1 determines a primary or leading channel, typically the strongest channel (the channel that has the largest frame energy). Step S2 determines the cross-correlation between each secondary or lagging channel and the primary channel for a predetermined interval, for example a part of or a complete frame. Step S3 stores lag candidates for each secondary channel. These lag candidates are defined by the positions of a number of the highest cross-correlation peaks and the closest positions around each peak for each secondary channel. One could for instance choose the 3 highest peaks, and then add the closest positions on both sides of each peak, giving a total of 9 lag candidates. If high-resolution (fractional) lags are used the number of candidates around each peak may be increased to, for example, 5 or 7. The higher resolution may be obtained by up-sampling of the input signal. The lag for the primary channel may in a simple embodiment be considered to be zero. However, since the pulses in the codebook typically cannot have arbitrary positions, a certain coding gain may be achieved by assigning a lag also to the primary channel. This is especially the case when high-resolution lags are used. In step S4 a temporary shared fixed codebook vector is formed for each stored lag candidate combination. Step S5 selects the lag combination that corresponds to the best temporary codebook vector. Step S6 determines the optimum inter-channel gains. Finally step S7 determines the channel specific (non-shared) excitations and gains.
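
Steps S1 to S3 may be sketched in Python as follows. This is only an illustration with integer lags: the function names, the sign convention (a positive lag means that the secondary channel lags the primary channel) and the definition of a peak as a local maximum of the cross-correlation are assumptions of the sketch.

```python
def primary_channel(channels):
    """Step S1: index of the channel with the largest frame energy."""
    energies = [sum(v * v for v in ch) for ch in channels]
    return energies.index(max(energies))


def cross_correlation(primary, secondary, max_lag):
    """Step S2: c[d] = sum_n primary[n] * secondary[n + d], |d| <= max_lag.

    Both channels are assumed to have the same length.
    """
    n = len(primary)
    return {d: sum(primary[i] * secondary[i + d]
                   for i in range(n) if 0 <= i + d < n)
            for d in range(-max_lag, max_lag + 1)}


def lag_candidates(corr, num_peaks=3, neighbours=1):
    """Step S3: positions of the highest peaks plus their closest neighbours."""
    lags = sorted(corr)
    peaks = []
    for i, d in enumerate(lags):
        left_ok = i == 0 or corr[d] > corr[lags[i - 1]]
        right_ok = i == len(lags) - 1 or corr[d] >= corr[lags[i + 1]]
        if left_ok and right_ok:
            peaks.append(d)
    peaks.sort(key=lambda d: corr[d], reverse=True)
    candidates = set()
    for d in peaks[:num_peaks]:
        for offset in range(-neighbours, neighbours + 1):
            if d + offset in corr:
                candidates.add(d + offset)
    return sorted(candidates)
```

With the default arguments this gives at most 3 × 3 = 9 candidates per secondary channel, and fewer when neighbourhoods overlap or a peak lies at the edge of the search interval.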

[0033] In a variation of this algorithm all of or the best temporary codebook vectors and corresponding lags and inter-channel gains are retained. For each retained combination a channel specific search in accordance with step S7 is performed. Finally, the best combination of shared and individual fixed codebook excitation is selected.

[0034] In order to reduce the complexity of this method, it is possible to restrict the excitation vector of the temporary codebook to only a few pulses. For example, in the GSM system the complete fixed codebook of an enhanced full rate channel includes 10 pulses. In this case 3-5 temporary codebook pulses are reasonable. In general 25-50% of the total number of pulses would be a reasonable number. When the best lag combination has been selected, the complete codebook is searched only for this combination (typically the already positioned pulses are unchanged, only the remaining pulses of a complete codebook have to be positioned).

[0035] FIG. 6 is a flow chart of another embodiment of a multi-part fixed codebook search method in accordance with the present invention. In this embodiment steps S1, S6 and S7 are the same as in the embodiment of FIG. 5. Step S10 positions a new excitation vector pulse in an optimum position for each allowed lag combination (the first time this step is performed all lag combinations are allowed). Step S11 tests whether all pulses have been consumed. If not, step S12 restricts the allowed lag combinations to the best remaining combinations. Thereafter another pulse is added to the remaining allowed combinations. Finally, when all pulses have been consumed, step S13 selects the best remaining lag combination and its corresponding shared fixed codebook vector.

[0036] There are several possibilities with regard to step S12. One possibility is to retain only a certain percentage, for example 25%, of the best lag combinations in each iteration. However, to avoid being left with only one combination before all pulses have been consumed, it is possible to ensure that at least a certain number of combinations remain after each iteration. One possibility is to make sure that there always remain at least as many combinations as there are pulses left plus one. In this way there will always be several candidate combinations to choose from in each iteration.
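
The retention rule for step S12 may be sketched in Python as follows. The function name and the representation of each candidate as an (error, lag combination) pair, where a lower error is better, are assumptions of this sketch.

```python
import math


def prune_lag_combinations(scored, pulses_left, keep_fraction=0.25):
    """Keep the best keep_fraction of the candidates, but never fewer
    than pulses_left + 1 (or all of them, if fewer are available).

    scored is a list of (error, lag_combination) pairs.
    """
    keep = max(math.ceil(keep_fraction * len(scored)), pulses_left + 1)
    keep = min(keep, len(scored))
    return sorted(scored, key=lambda item: item[0])[:keep]
```

Starting from 16 lag combinations with 2 pulses left, the rule keeps 4 combinations; with 2 pulses left it then keeps 3 of those 4 rather than 1, so that a choice remains in every iteration.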

[0037] For the fixed codebook gains, each channel requires one gain for the shared fixed codebook and one gain for the individual codebook. These gains will typically have significant correlation between the channels. They will also be correlated to gains in the adaptive codebook. Thus, inter-channel predictions of these gains will be possible, and vector quantization may be used to encode them.

[0038] Returning to FIG. 4, the adaptive codebook includes one adaptive codebook AC1, AC2 for each channel. An adaptive codebook can be configured in a number of ways in a multi-channel coder.

[0039] One possibility is to let all channels share a common pitch lag. This is feasible when there is a strong inter-channel correlation. Even when the pitch lag is shared, the channels may still have separate pitch gains g_(A11)-g_(A22). The shared pitch lag is searched in a closed loop fashion in all channels simultaneously.

[0040] Another possibility is to let each channel have an individual pitch lag. This is feasible when there is a weak inter-channel correlation (the channels are independent). The pitch lags may be coded differentially or absolutely.

[0041] A further possibility is to use the excitation history in a cross-channel manner. For example, channel 2 may be predicted from the excitation history of channel 1 at inter-channel lag P₁₂. This is feasible when there is a strong inter-channel correlation.

[0042] As in the case with the fixed codebook, the described adaptive codebook structure is very flexible and suitable for multi-mode operation. The choice whether to use shared or individual pitch lags may be based on the residual signal energy. In a first step the residual energy of the optimal shared pitch lag is determined. In a second step the residual energy of the optimal individual pitch lags is determined. If the residual energy of the shared pitch lag case exceeds the residual energy of the individual pitch lag case by a predetermined amount, individual pitch lags are used. Otherwise a shared pitch lag is used. If desired, a moving average of the energy difference may be used to smooth the decision.
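
This decision rule may be sketched in Python as follows. The class name and parameters are hypothetical, and the moving average is realized as an exponentially weighted average, which is only one possible choice; the description does not specify the form of the average or the size of the predetermined amount.

```python
class PitchLagModeSelector:
    """Closed-loop choice between a shared pitch lag and individual pitch lags."""

    def __init__(self, margin, smoothing=0.0):
        self.margin = margin        # the predetermined amount
        self.smoothing = smoothing  # 0.0 disables smoothing (assumed EMA factor)
        self.avg_diff = None

    def use_individual_lags(self, shared_energy, individual_energy):
        """Return True if individual pitch lags should be used for this frame."""
        diff = shared_energy - individual_energy
        if self.avg_diff is None:
            self.avg_diff = diff
        else:
            self.avg_diff = (self.smoothing * self.avg_diff
                             + (1.0 - self.smoothing) * diff)
        return self.avg_diff > self.margin
```

With smoothing enabled, a single frame in which the shared lag happens to perform poorly does not immediately switch the mode, which avoids frequent toggling between the two configurations.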

[0043] This strategy may be considered as a “closed-loop” strategy to decide between shared or individual pitch lags. Another possibility is an “open-loop” strategy based on, for example, inter-channel correlation. In this case, a shared pitch lag is used if the inter-channel correlation exceeds a predetermined threshold. Otherwise individual pitch lags are used.

[0044] Similar strategies may be used to decide whether to use inter-channel pitch lags or not.

[0045] Furthermore, a significant correlation is to be expected between the adaptive codebook gains of different channels. These gains may be predicted from the internal gain history of the channel, from gains in the same frame but belonging to other channels, and also from fixed codebook gains. As in the case with the fixed codebook, vector quantization is also possible.

[0046] In LPC synthesis filter block 12M in FIG. 4 each channel uses an individual LPC (Linear Predictive Coding) filter. These filters may be derived independently in the same way as in the single channel case. However, some or all of the channels may also share the same LPC filter. This allows for switching between multiple and single filter modes depending on signal properties, e.g. spectral distances between LPC spectra. FIG. 7 is a block diagram of an exemplary embodiment of the analysis part of a multi-channel LPAS speech encoder in accordance with the present invention. In addition to the blocks that have already been described with reference to FIGS. 1 and 2, the analysis part in FIG. 7 includes a multi-mode analysis block 40. Block 40 determines the inter-channel correlation to determine whether there is enough correlation between the channels to justify encoding using only the shared fixed codebook FCS, lags D1, D2 and gains g_(FS1), g_(FS2). If not, it will be necessary to use the individual fixed codebooks FC1, FC2 and gains g_(F1), g_(F2). The correlation may be determined by the usual correlation in the time domain, i.e. by shifting the secondary channel signals with respect to the primary signal until a best fit is obtained. If there are more than two channels, a shared fixed codebook will be used if the smallest correlation value exceeds a predetermined threshold. Another possibility is to use a shared fixed codebook for the channels that have a correlation to the primary channel that exceeds a predetermined threshold and individual fixed codebooks for the remaining channels. The exact threshold may be determined by listening tests.
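
The two threshold rules for block 40 may be sketched in Python as follows. The function names are hypothetical, and the correlation measure (the largest cross-correlation over the tested shifts, normalized by the frame energies) is an assumption; the description only calls for the usual time-domain correlation and leaves the threshold to listening tests.

```python
import math


def best_fit_correlation(primary, secondary, max_lag):
    """Largest normalized cross-correlation over shifts -max_lag..max_lag.

    Normalizing by the full-frame energies is a simplification of this
    sketch. Negative correlations are treated as no correlation.
    """
    n = len(primary)
    norm = math.sqrt(sum(v * v for v in primary) * sum(v * v for v in secondary))
    if norm == 0.0:
        return 0.0
    best = 0.0
    for d in range(-max_lag, max_lag + 1):
        c = sum(primary[i] * secondary[i + d]
                for i in range(n) if 0 <= i + d < n)
        best = max(best, c / norm)
    return best


def use_shared_only(correlations, threshold):
    """First rule: shared codebook for all channels if the smallest
    correlation to the primary channel exceeds the threshold."""
    return min(correlations) > threshold


def shared_per_channel(correlations, threshold):
    """Second rule: per secondary channel, True selects the shared
    codebook and False an individual codebook."""
    return [c > threshold for c in correlations]
```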

[0047] In a low bit-rate coder the fixed codebook may include only a shared codebook FCS and corresponding lag elements D1, D2 and inter-channel gains g_(FS1), g_(FS2). This embodiment is equivalent to an inter-channel correlation threshold equal to zero.

[0048] The analysis part may also include a relative energy calculator 42 that determines scale factors e₁, e₂ for each channel. These scale factors may be determined in accordance with:

$$e_{i} = \frac{E_{i}}{\sum_{j} E_{j}}$$

[0049] where E_(i) is the energy of the current frame of channel i. Using these scale factors, the weighted residual energy R₁, R₂ for each channel may be rescaled in accordance with the relative strength of the channel, as indicated in FIG. 7. Rescaling the residual energy for each channel has the effect of optimizing for the relative error in each channel rather than optimizing for the absolute error in each channel. Multi-channel error rescaling may be used in all steps (deriving LPC filters, adaptive and fixed codebooks).

[0050] The scale factors may also be more general functions of the relative channel strength e_(i), for example

$$f(e_{i}) = \frac{\exp\left(\alpha\left(2e_{i} - 1\right)\right)}{1 + \exp\left(\alpha\left(2e_{i} - 1\right)\right)}$$

[0051] where α is a constant in the interval 4-7, for example α≈5. The exact form of the scaling function may be determined by subjective listening tests.
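
The relative channel strength and the example scaling function may be computed as in the following Python sketch. The function names are hypothetical, the even split used for an all-silent frame is an assumption, and the sketch does not show how the resulting factors are applied to the residual energies R₁, R₂, since that is defined only by FIG. 7.

```python
import math


def relative_energies(channels):
    """e_i = E_i / sum_j E_j, with E_i the frame energy of channel i."""
    energies = [sum(v * v for v in ch) for ch in channels]
    total = sum(energies)
    if total == 0.0:
        # Assumption: split evenly when every channel is silent.
        return [1.0 / len(channels)] * len(channels)
    return [e / total for e in energies]


def scale_function(e_i, alpha=5.0):
    """f(e_i) = exp(alpha * (2 e_i - 1)) / (1 + exp(alpha * (2 e_i - 1)))."""
    x = alpha * (2.0 * e_i - 1.0)
    return math.exp(x) / (1.0 + math.exp(x))
```

For two equally strong channels e₁ = e₂ = 0.5 and f(0.5) = 0.5. With α = 5 the function is close to 0 for a very weak channel and close to 1 for a dominant channel, and f(e) + f(1 - e) = 1.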

[0052] The functionality of the various elements of the described embodiments of the present invention is typically implemented by one or several microprocessors or micro/signal processor combinations and corresponding software.

[0053] The description above has been primarily directed towards an encoder. The corresponding decoder would only include the synthesis part of such an encoder. Typically an encoder/decoder combination is used in a terminal that transmits/receives coded signals over a bandwidth limited communication channel. The terminal may be a radio terminal in a cellular phone or base station. Such a terminal would also include various other elements, such as an antenna, amplifier, equalizer, channel encoder/decoder, etc. However, these elements are not essential for describing the present invention and have therefore been omitted.

[0054] It will be understood by those skilled in the art that various modifications and changes may be made to the present invention without departure from the scope thereof, which is defined by the appended claims.

[0055] References

[0056] [1] A. Gersho, “Advances in Speech and Audio Compression”, Proc. of the IEEE, Vol. 82, No. 6, pp. 900-918, June 1994.

[0057] [2] A. S. Spanias, “Speech Coding: A Tutorial Review”, Proc. of the IEEE, Vol. 82, No. 10, pp. 1541-1582, October 1994.

[0058] [3] WO 00/19413 (Telefonaktiebolaget LM Ericsson).

1. A multi-channel linear predictive analysis-by-synthesis signal encoder including a multi-part fixed codebook, including an individual fixed codebook (FC1, FC2) for each channel; a shared fixed codebook (FCS) containing code book vectors that are common to all channels; and means (40) for analyzing inter-channel correlation for dynamic bit allocation between said individual fixed codebooks and said shared fixed codebook.
 2. The encoder of claim 1, characterized in that said shared fixed codebook is connected to an individual delay element (D1, D2) for each channel.
 3. The encoder of claim 2, characterized in that said individual delay elements (D1, D2) are high-resolution elements.
 4. The encoder of claim 2 or 3, characterized in that each delay element (D1, D2) is connected to a corresponding gain element (g_(FS1), g_(FS2)).
 5. The encoder of claim 1, characterized by a multi-part adaptive codebook having an individual adaptive codebook (AC1, AC2) and an individual pitch lag (P₁₁, P₂₂) for each channel.
 6. The encoder of claim 5, characterized by means for determining whether a common pitch lag can be shared by all channels.
 7. The encoder of claim 5, characterized by inter-channel pitch lags (P₁₂, P₂₁) between each channel and the other channels.
 8. The encoder of claim 1, characterized by means (42) for rescaling the residual energy of each channel in accordance with the relative channel strength.
 9. A terminal including a multi-channel linear predictive analysis-by-synthesis speech encoder/decoder having a multi-part fixed codebook, including an individual fixed codebook (FC1, FC2) for each channel; a shared fixed codebook (FCS) containing code book vectors that are common to all channels; and means (40) for analyzing inter-channel correlation for dynamic bit allocation between said individual fixed codebooks and said shared fixed codebook.
 10. The terminal of claim 9, characterized in that said shared fixed codebook is connected to an individual delay element (D1, D2) for each channel.
 11. The terminal of claim 10, characterized in that said individual delay elements (D1, D2) are high-resolution elements.
 12. The terminal of claim 10 or 11, characterized in that each delay element (D1, D2) is connected to a corresponding gain element (g_(FS1), g_(FS2)).
 13. The terminal of claim 9, characterized by a multi-part adaptive codebook having an individual adaptive codebook (AC1, AC2) and an individual pitch lag (P₁₁, P₂₂) for each channel.
 14. The terminal of claim 13, characterized by means for determining whether a common pitch lag can be shared by all channels.
 15. The terminal of claim 13, characterized by inter-channel pitch lags (P₁₂, P₂₁) between each channel and the other channels.
 16. The terminal of any of the preceding claims 9-15, characterized in that said terminal is a radio terminal.
 17. A multi-channel linear predictive analysis-by-synthesis signal encoding method, including the steps of analyzing inter-channel correlation; and dynamically changing, depending on the current inter-channel correlation, encoding bit allocation between fixed codebooks dedicated to individual channels and a shared fixed codebook containing code book vectors that are common to all channels.
 18. A multi-channel linear predictive analysis-by-synthesis signal encoding method characterized by: determining a desired gross bit rate; analyzing inter-channel correlation; and dynamically changing, depending on the current inter-channel correlation and said desired gross bit rate, encoding bit allocation between fixed codebooks dedicated to individual channels and a shared fixed codebook containing code book vectors that are common to all channels.